[57] ABSTRACT

This invention is an adaptive neuron for use in neural 
network processors. The adaptive neuron participates 
in the supervised learning phase of operation on a co- 
equal basis with the synapse matrix elements by adap- 
tively changing its gain in a similar manner to the 
change of weights in the synapse elements. In this
manner, training time is decreased by as much as three 
orders of magnitude. 

19 Claims, 5 Drawing Sheets 
NEURAL NETWORK WITH DYNAMICALLY
ADAPTABLE NEURONS 

ORIGIN OF THE INVENTION

The invention described herein was made in the per- 
formance of work under a NASA contract, and is sub- 
ject to the provisions of Public Law 96-517 (35 USC 
202) in which the Contractor has elected not to retain 
title.

This application is a continuation of application Ser. 
No. 07/905,061, filed Jun. 24, 1992, now abandoned 
which is a continuation of application, Ser. No. 
07/473,024, filed Jan. 31, 1990, now abandoned. 
TECHNICAL FIELD

The invention relates to neural networks and processors and,
more particularly, to an improvement in a neural network
employing a plurality of neurons connected to a network of
conductors which are selectively interconnected by a plurality
of problem-defining synapses, each having an adjustable
weighting factor, wherein the network is parameterized in an
iterative learning process in which problem-defining inputs to
the neurons and the error signal between actual outputs from
the network and expected outputs from the network are used to
incrementally change the weighting factors. The improvement
reduces the number of iterations required for the network to
learn how to solve a problem of interest and comprises, in each
neuron between an input thereof for receiving at least one
input signal and an output thereof for outputting an output
signal value, a neural conductive element having a variable
gain and gain adjustment logic for dynamically adjusting the
variable gain as a function of present values of the
instantaneous error during a learning process of the neural
network.

In the preferred embodiment, the gain adjustment logic makes
incremental changes in the gain which are proportional to the
negative of the derivative of the instantaneous error with
respect to the gain. Further in the preferred embodiment, the
incremental change $\Delta T_i^n$ for the i-th neuron on a
layer n can be given as:

$$\Delta T_i^n = -\eta \frac{\partial E}{\partial T_i^n}$$

where $\eta$ is the temperature learning rate, which is a
pre-established constant.

BACKGROUND ART 


In the field of neural network processors, there is currently
much interest, as the ability to create large processors in a
small space using VLSI techniques has suddenly made these
analog techniques viable for purposes such as solving complex
problems in real time. Problems such as best path location are
quickly solved using the analog approach of a neural network,
whereas the same problem would be both hardware and time
intensive if approached using digital computation facilities
in a parallel processing manner.

While a neural network processor does its computations
rapidly, it does have one drawback relative to the digital
approach. With a digital computer, the problem to be solved is
described as a series of precise logical steps to be
accomplished. These steps are programmed into a series of
computer instructions which are then loaded into the computer
or computers which are to perform the calculations. At
run-time, the input parameters for the problem to be solved
are provided to the computer(s) and the problem instructions
are executed to provide an answer to the problem. There is no
possible variation; that is, a digital computer operates as a
brute force automaton executing the sequence of instructions
provided. If anything happens which is not provided for in the
instruction sequence, the program fails and no meaningful
results are provided.

The neural network, on the other hand, is patterned 
after the human brain and must be taught by a learning 
process. A typical prior art neural network can appear 
in basic form as depicted in FIG. 1 where it is generally 
indicated as 10. The network 10 comprises a matrix of 
conductors 12 to which a number of neurons 14 are 
connected to provide inputs. The conductors 12 are 
interconnected by a number of synapses 16 defining the 
general rules of the problem to be solved. The neurons 
14 are generally non-linear elements having an input 18 
to which an analog signal representing a variable of the 
problem to be solved can be connected. The outputs 20 
of the neurons 14 are connected to the conductors 12 on 
one side of the synapses 16. The conductors 12 of the 
other side of the synapses 16 provide the outputs 22 of 
the network 10 representing the solution (in analog 
form) to the problem being solved. 

A complex problem may take the form of a multi- 
layer neural network such as that generally indicated as 
10' in FIG. 2. In such a multi-layer neural network 10', 
the analog inputs defining the problem are input to a
first set of neurons 14. The outputs from the first layer 
of the network 10' are input to a second set of neurons 
14 at the input to the second layer of the network 10' 
and the outputs representing the solution to the problem 
are found at the outputs 22 from the second layer of the 
network 10'. 

A neural network (single layer 10 or multi-layer 10') 
“learns” by experience. A first set of variables for the 
problem to be solved are input to the network 10, 10' 
and the outputs 22 representing the solution to the prob- 
lem for the given parameters are inspected. Each of the 
synapses 16 is then adjusted as to its performance fac- 
tors (weight) in the total problem. As a result of the 
“answer” provided, the weights of the synapses 16 are 
adjusted slightly, as necessary, in a manner which will 
tend to move the answer to the problem closer to the 
correct answer. This process is repeated over and over 
with a second, third, etc. set of variables until the synap- 
ses 16 have all been adjusted to the point where the 
proper answer to the problem is given for any set of 
variables which are input. At that point, the network 
10, 10' has learned by experience how to solve the
particular problem.

One of the current issues in the theory of supervised learning
concerns the scaling properties of neural networks. While
low-order neural computations are easily handled on sequential
or parallel processors, the treatment of high-order problems
proves to be intractable. The computational burden involved in
implementing supervised learning algorithms, such as back
propagation, on networks with large connectivity or training
sets is immense and impractical. Until the development of
fully parallel hardware, such applications as image
recognition or pattern classification prove unwieldy for
current algorithms to handle. It is clear, therefore, that a
more computationally efficient learning rule is required to
deal with such applications.
Current neuromorphic models regard the neuron as a
strictly passive non-linear element and the synapse, on 
the other hand, as the primary source of information 
processing insofar as “learning” is concerned. In these 
standard prior art models, information processing is 
performed by propagating within a network synapti- 
cally weighted neuronal contributions in either a feed- 
forward, feed-backward, or fully recurrent fashion. 
Information is contained in the synaptic weights and
their rapid evaluation forms the goal of these algorithms.
Prior art artificial neural networks take the point of 
view that the neuron can be modeled by a simple non- 
linear “wire” type of device. The only prior art imple- 
mentation of any adjustability to the neurons is depicted 
in FIG. 3. As depicted therein, the neurons 14 (i.e. im- 
plemented as non-linear elements as described above) 
can be manually adjusted en masse (as indicated by the 
dashed box 24) by means of a manual input device 26 
under the control of a human operator. The neurons 14 
each have an adjustable gain which is referred to in the 
art as the “temperature” of the neuron. Typically, the 
learning process begins with the temperature (i.e. the 
gain) of the neurons 14 at a high level. As the learning 
process proceeds, the operator may periodically and 
randomly begin to lower the temperature of all the 
neurons 14 simultaneously by means of the manual input 
device 26. This results in some measurable and meaning- 
ful improvement in the learning process of the network; 
that is, there is some decrease in learning time when the 
operator adjusts the temperature of the neurons 14. 

Substantial evidence is beginning to emerge and be 
reported that information processing apparently occurs 
in biological neural networks (e.g. the human brain) at 
the neuronal level. If this is true, one can suppose that 
by providing a dynamically adaptable neuron element 
for use in artificial neural networks which adapts as part 
of the learning process along with the synapses, the 
learning process can be improved and the time therefor 
decreased significantly. 

STATEMENT OF THE INVENTION 

Accordingly, it is an object of this invention to pro- 
vide a dynamically adaptable neuron element for use in 
artificial neural networks which adapts as part of the 
learning process along with the synapses. 

It is another object of this invention to provide a 
dynamically adaptable neuron element for use in artific- 
ial neural networks which improves the learning pro- 
cess and decreases the time therefor significantly. 

Other objects and benefits of this invention will be- 
come apparent from the detailed description which 
follows hereinafter when taken in conjunction with the 
drawing figures which accompany it. 

BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 is a simplified drawing depicting a basic prior 
art neural network comprised of neurons and synapses. 

FIG. 2 is a simplified drawing depicting a basic prior 
art multi-level neural network. 

FIG. 3 is a simplified drawing depicting a basic prior art
neural network comprised of neurons and synapses and
indicating how the common temperature of the neurons can be
regulated manually during the network’s learning process.

FIG. 4 is a simplified drawing depicting a multi-level neural
network according to the present invention comprised of
synapses and adaptive neurons in which the temperature of each
neuron is dynamically regulated during the network’s learning
process.

FIG. 5 is a simplified representation of one adaptive neuron
according to the present invention and depicting the function
whereby the temperature of the neuron is dynamically regulated
during the network’s learning process.

FIG. 6 is a graph of the activation function f shown plotted
against the input s to an adaptive neuron according to the
present invention as depicted in FIG. 5 for several different
temperatures. The curves shown are for temperatures T ranging
from 0.2 to 2.0 in increments of 0.2 with the sharpest
activation curve being for T=0.01 and showing the high
sensitivity of f with respect to T.

FIGS. 7 and 8 are simplified representations of the
architecture of the neural network used for testing the
adaptive neuron model on the XOR problem as discussed herein
by way of example with FIG. 7 showing the randomized synaptic
weights and neuronal temperatures prior to training and FIG. 8
showing these same weights after training.

FIG. 9 is a graph showing a comparison of training statistics
between the adaptive neuron model and the standard back
propagation model with the error during training plotted
against the training iteration number.

FIG. 10 is a companion graph to the graph of FIG. 9 showing
the standard deviation versus iteration number for the
adaptive neuron model and the standard back propagation model.

FIG. 11 is another graph showing a comparison of training
statistics between the adaptive neuron model and the standard
back propagation model with the error during training plotted
against the training iteration number for the training case
where the adaptive neuron model escapes a local minimum.

FIG. 12 is a companion graph to the graph of FIG. 11 showing
the standard deviation versus iteration number for the
adaptive neuron model and the standard back propagation model.

DETAILED DESCRIPTION OF THE 
INVENTION 

A multi-level neural network 10" according to the present
invention is depicted in FIG. 4. The principal difference
between the network 10" and the prior art networks 10, 10'
described above is the substitution of adaptive neurons 14'
according to this invention for the standard, prior art
neurons 14 described above. As noted in the drawing figure,
each neuron 14' has its own temperature ($T_i$) associated
with it. While a particular approach to implementing the
adaptive neurons 14' is to be described in detail hereinafter
and that particular implementation is that which was modeled
and for which the results are reported, it should be
recognized and appreciated by those skilled in the art that
the basic point of novelty of this invention is the use of
adaptive neurons within a neural network. It is this breadth
which should be accorded the present invention as described
and claimed hereinafter.

A neuron 14' according to the present invention in its
preferred (and modeled) embodiment, depicting the temperature
adjustment algorithm employed, is shown in FIG. 5. The neuron
14' on layer n receives its inputs from the outputs of the
previous layer, n-1, and produces its own output as a
dynamically adjustable function of its temperature. The
functional adjustment of the neuron 14' in its preferred
embodiment is analogous to the functional adjustment of the
synapses 16; that is, the adjustment of the synapses 16 is
most accurately described as a gradient descent in weight
space, and the preferred adjustment to the adaptive neurons
14' is a gradient descent in temperature space. This concept
will now be addressed in more detail.

As mentioned earlier herein, the basic approach of this
invention is the extending of the “learning” process to the
neuron rather than the prior art approach of using the neuron
as a mere conduit to the synapses which have the total
learning responsibility. Such an extension can only be made
after first defining what constitutes learning at the neuronal
level. The adaptive neuron 14' can then be seen to provide an
additional or secondary source of information processing and
knowledge retention. This is achieved in this invention by
treating both the neuronal and synaptic variables as
optimization parameters. Both the amplitude and temperature
(i.e. gain) of the sigmoid function represent such neuronal
parameters. In much the same way that the synaptic
interconnection weights (W) require optimization to reflect
the knowledge contained within the training set, so should the
temperature (T) term be optimized. It should be emphasized
that the method of this invention does not necessarily
optimize a global neuron parameter for the network; rather, it
can either allow each neuron to have its own characteristic
local value, or can determine a global one. In either case,
however, the neuron parameters are dynamically adjustable as a
function of the progress of the learning process and not
independently and manually adjustable upon an associated
physical basis as in the prior art case described above with
respect to FIG. 3. It should be noted at this point that if
the number of neurons in a network goes as $O(N)$, then the
number of synaptic connections typically increases as
$O(N^2)$. The fact that the activation function is extremely
sensitive to small changes in temperature or amplitude and
that there are far fewer neuronal parameters to update than
synaptic weights suggests that if indeed the temperature
optimization scheme works, a significant reduction in
convergence time is expected over standard back propagation.
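
By way of illustration only, the scaling argument above can be checked with a short Python sketch. It assumes a fully connected feed-forward network described by a hypothetical list `layer_sizes`, with one temperature per neuron and one weight per synapse. The function name `count_parameters` is likewise an assumption, and thresholds are not counted.

```python
def count_parameters(layer_sizes):
    """Count neuronal and synaptic parameters of a layered network.

    layer_sizes is a hypothetical description such as [2, 2, 1].
    Returns (number of temperatures, number of weights).
    """
    temperatures = sum(layer_sizes)  # one temperature per neuron
    # One weight for every pair of neurons in consecutive layers.
    weights = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
    return temperatures, weights


for n in (2, 10, 100, 1000):
    temps, weights = count_parameters([n, n, n])
    print(f"N={n:5d}  temperatures={temps:6d}  weights={weights:9d}")
```

For three layers of N neurons each, the temperature count grows as 3N while the weight count grows as 2N^2, which is the linear versus quadratic growth referred to in the text.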

Although the principle of neuronal optimization of this
invention is an entirely general concept (and therefore
applicable to any learning scheme), the popular feed-forward
back-propagation learning rule was selected by the inventor
herein for modeling implementation and performance testing. It
is this particular tested implementation which is to be
described hereinafter. It is the inventor’s intent that the
invention as described herein not be limited in any way
because of the use of a particular example and implementation.

Back propagation is an example of supervised learning where,
for each presentation consisting of an input vector
$\mathbf{o}^{p}$ and its associated target vector
$\mathbf{t}^{p}$, the algorithm attempts to adjust the
synaptic weights so as to minimize the sum-squared error E
over all patterns p. The learning rule treats the
interconnection weights as the only variables and consequently
executes gradient descent in weight space. The error is given
by

$$E = \sum_p E_p = \frac{1}{2} \sum_p \sum_i \left( t_i^p - o_i^{np} \right)^2$$

The quantity $t_i^p$ is the i-th component of the p-th desired
output vector pattern and $o_i^{np}$ is the activation of the
corresponding neuron in the final layer n. For notational ease
the summation over p is dropped and a single pattern is
considered. On completion of learning, the synaptic weights
capture the transformation linking the input to output
variables. One major drawback of this algorithm when
implemented with prior art neurons is the excessive
convergence time. It was expected by the inventor herein that
a significant decrease in convergence time could be realized
provided that the neurons were allowed to participate in the
learning process. To accomplish this objective, each neuron in
the network is characterized by a set of parameters whose
values are optimized according to some rule, and not in a
heuristic fashion as in simulated annealing. Upon training
completion, learning is thus captured in both the synaptic and
neuronal parameters. The scheme as implemented involves
assigning a temperature, T, to each neuron, as seen in the
activation function. These parameters are to be optimized so
as to significantly decrease the network’s learning time.
Learning, therefore, is now captured in both the synaptic
weights and neuronal parameters. The activation of a unit, say
the i-th neuron on the m-th layer, is given by $o_i^m$. This
response is computed by a non-linear operation on the weighted
responses of neurons from the previous layer, as seen in
FIG. 5. A common function to use is the logistic function,

$$o_i^m = \frac{1}{1 + e^{-\beta s_i^m}}$$

where $T = 1/\beta$ is the temperature of the network. The net
weighted input to the neuron is found by summing products of
the synaptic weights and corresponding neuronal outputs from
units on the previous layer,

$$s_i^m = \sum_j w_{ij}^{m-1} o_j^{m-1}$$

where $o_j^{m-1}$ represents the fan-in units and
$w_{ij}^{m-1}$ the pairwise connection strength between neuron
i in layer m and neuron j in layer m-1. Furthermore, the
particular rule chosen to optimize these neuronal parameters
is based on gradient descent in the sum squared error, E, in
temperature space.
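
The forward computation described above (net input, logistic activation with a temperature, and the single-pattern error) may be sketched in Python as follows. This is a minimal illustration and not the modeled implementation: the nested-list layout of `weights` and `temperatures`, the function names, and the numerical values are all assumptions. Input units are treated as plain signal sources, and thresholds are omitted.

```python
import math


def logistic(s, temperature):
    """o = 1 / (1 + exp(-beta * s)) with beta = 1 / T."""
    return 1.0 / (1.0 + math.exp(-s / temperature))


def forward(inputs, weights, temperatures):
    """Propagate one pattern through the layers.

    weights[m][i][j] connects neuron j of layer m to neuron i of
    layer m + 1; temperatures[m][i] belongs to neuron i of layer m + 1.
    Returns the activations of every layer, input layer first.
    """
    activations = [list(inputs)]
    for layer_w, layer_t in zip(weights, temperatures):
        prev = activations[-1]
        # Net weighted input s_i = sum_j w_ij * o_j.
        nets = [sum(w * o for w, o in zip(row, prev)) for row in layer_w]
        activations.append([logistic(s, t) for s, t in zip(nets, layer_t)])
    return activations


def pattern_error(target, output):
    """E_p = 1/2 * sum_i (t_i - o_i)^2 for a single pattern."""
    return 0.5 * sum((t - o) ** 2 for t, o in zip(target, output))


# Hypothetical 2-2-1 network with hand-picked values.
weights = [[[0.5, -0.5], [1.0, 1.0]], [[1.5, -1.0]]]
temperatures = [[1.0, 1.0], [0.5]]
outputs = forward([1.0, 0.0], weights, temperatures)[-1]
print(outputs, pattern_error([1.0], outputs))
```

Lowering a neuron's temperature in `temperatures` steepens its response to the same net input, which is the degree of freedom the temperature learning rule acts upon.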

Therefore, the incremental change in the temperature term must
be proportional to the negative of the derivative of the error
term with respect to temperature. Focusing on the i-th neuron
on the layer n, the incremental change can be given as:

$$\Delta T_i^n = -\eta \frac{\partial E}{\partial T_i^n}$$

where $\eta$ is the temperature learning rate, which is a
pre-established constant. The derivative in this equation can
be expressed as the product of two terms by the chain rule,

$$\frac{\partial E}{\partial T_i^n} = \frac{\partial E}{\partial o_i^n} \frac{\partial o_i^n}{\partial T_i^n}$$

Substituting expressions and leaving the explicit functional
form of the activation function unspecified, i.e.
$o_i^n = f(T_i^n, w, \ldots)$, we obtain,

$$\frac{\partial E}{\partial T_i^n} = -\left[ t_i - o_i^n \right] \frac{\partial o_i^n}{\partial T_i^n}$$


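In a similar fashion, the temperature update equation for
the previous layer is given by,

$$\Delta T_j^{n-1} = -\eta \frac{\partial E}{\partial T_j^{n-1}}$$

Using the chain rule, this can be expressed as,

$$\frac{\partial E}{\partial T_j^{n-1}} = \sum_i \frac{\partial E}{\partial o_i^n} \frac{\partial o_i^n}{\partial s_i^n} \frac{\partial s_i^n}{\partial o_j^{n-1}} \frac{\partial o_j^{n-1}}{\partial T_j^{n-1}}$$

Substituting expressions and simplifying reduces the
above to,

$$\frac{\partial E}{\partial T_j^{n-1}} = -\sum_i \left[ t_i - o_i^n \right] \frac{\partial o_i^n}{\partial s_i^n} \, w_{ij}^{n-1} \, \frac{\partial o_j^{n-1}}{\partial T_j^{n-1}}$$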
By repeating the above derivation for the previous layer, i.e.
determining the partial derivative of E with respect to
$T_k^{n-2}$ etc., a simple recursive relationship emerges for
the temperature terms. Specifically, the updating scheme for
the k-th neuronal temperature on the m-th layer is given by,

$$\Delta T_k^m = -\eta \frac{\partial E}{\partial T_k^m}$$

where,

$$\frac{\partial E}{\partial T_k^m} = -\delta_k^m \frac{\partial o_k^m}{\partial T_k^m}$$

In the above expression, the error signal $\delta_k^m$ takes
on the value,

$$\delta_k^m = \left[ t_k - o_k^m \right]$$

if neuron k lies on an output layer, or,

$$\delta_k^m = \sum_i \delta_i^{m+1} \frac{\partial o_i^{m+1}}{\partial s_i^{m+1}} \, w_{ik}^m$$

if the neuron lies on a hidden layer.
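
The recursion above can be sketched in Python as shown below. The sketch assumes the logistic activation with a per-neuron temperature, for which the slope with respect to the net input is o(1 - o)/T and the slope with respect to the temperature is -(s/T^2) o(1 - o). The data layout, the function names and the numerical values are hypothetical, and thresholds are omitted.

```python
import math


def logistic(s, temperature):
    return 1.0 / (1.0 + math.exp(-s / temperature))


def forward(inputs, weights, temps):
    """Return (net inputs, activations); activations[0] is the input."""
    nets, acts = [], [list(inputs)]
    for layer_w, layer_t in zip(weights, temps):
        s = [sum(w * o for w, o in zip(row, acts[-1])) for row in layer_w]
        nets.append(s)
        acts.append([logistic(x, t) for x, t in zip(s, layer_t)])
    return nets, acts


def temperature_gradients(inputs, target, weights, temps):
    """dE/dT for every neuron, using the recursive error signal delta.

    weights[m][i][k] links neuron k of layer m to neuron i of layer m + 1.
    """
    nets, acts = forward(inputs, weights, temps)
    delta = [t - o for t, o in zip(target, acts[-1])]  # output layer
    grads = [None] * len(weights)
    for m in range(len(weights) - 1, -1, -1):
        out = acts[m + 1]
        do_ds = [o * (1 - o) / t for o, t in zip(out, temps[m])]
        do_dt = [-(s / t) * g for s, t, g in zip(nets[m], temps[m], do_ds)]
        # dE/dT_k = -delta_k * do_k/dT_k
        grads[m] = [-d * g for d, g in zip(delta, do_dt)]
        if m > 0:
            # Hidden layer: delta_k = sum_i delta_i * do_i/ds_i * w_ik
            delta = [
                sum(delta[i] * do_ds[i] * weights[m][i][k]
                    for i in range(len(out)))
                for k in range(len(acts[m]))
            ]
    return grads


# Hypothetical 2-2-1 network and a single training pattern.
weights = [[[0.5, -0.5], [1.0, 1.0]], [[1.5, -1.0]]]
temps = [[1.0, 0.8], [0.5]]
print(temperature_gradients([1.0, 0.0], [1.0], weights, temps))
```

Multiplying each returned gradient by the negative of the temperature learning rate gives the incremental temperature change for the corresponding neuron.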

The foregoing algorithm was tested by applying it to logic
problems. The network (in the fashion of the network 10" of
FIG. 4 including adaptive neurons 14') was trained on a
standard benchmark employed in the art for such purposes, i.e.
the exclusive-or (XOR) problem. It was useful to test the
algorithm employed in the adaptive neurons of the present
invention on such a problem since it is the classic problem
requiring hidden units and since many problems involve an XOR
as a subproblem. The application of the proposed learning rule
involved two phases. In the first phase, an input pattern was
presented and propagated forward through the network to
compute the output values $o_k$. This output was then compared
to its target value, resulting in an error signal for each
output unit. The second phase involved a backward pass through
the network during which the error signal was passed along the
network and the appropriate weight and temperature changes
made. Note that in addition to the synapses and neurons having
their own learning rates $\eta_W$ and $\eta$, respectively, an
additional degree of freedom can be introduced, if desired.
This is accomplished by allowing for relative time scales for
weight and temperature updates, i.e. $\tau_W$ and $\tau_T$,
respectively. This allows a particular simulation to place
varying emphasis on either of the two updating schemes. In
other words, the weight and temperature changes do not
necessarily have to occur simultaneously. The two are truly
autonomous and change independently as necessary for the
network to learn in an optimum manner. Thus, what we have
described is a gradient descent method for finding both
weights and temperatures in any feed forward network.
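
A minimal Python sketch of the two-phase procedure follows. It is offered under several assumptions and does not attempt to reproduce the reported results: the logistic activation is used, thresholds are omitted, the relative time scales are interpreted as hypothetical update periods (`weight_period`, `temp_period`), and a lower bound `T_FLOOR` on the temperature is added purely as a numerical safeguard that is not part of the described method.

```python
import math

T_FLOOR = 0.05  # assumed safeguard (not in the source): keeps T positive


def logistic(s, t):
    """Overflow-safe logistic of s / t."""
    z = s / t
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_step(x, target, w, temps, eta_w=0.1, eta_t=0.1,
               update_weights=True, update_temps=True):
    """One forward and one backward pass for one pattern.

    w[m][i][k] links neuron k of layer m to neuron i of layer m + 1 and
    temps[m][i] is the temperature of that neuron i. Both are updated in
    place. Returns the pattern error measured before the update.
    """
    # Phase 1: forward pass, storing net inputs and activations.
    nets, acts = [], [list(x)]
    for lw, lt in zip(w, temps):
        s = [sum(wik * o for wik, o in zip(row, acts[-1])) for row in lw]
        nets.append(s)
        acts.append([logistic(si, ti) for si, ti in zip(s, lt)])
    error = 0.5 * sum((t - o) ** 2 for t, o in zip(target, acts[-1]))

    # Phase 2: backward pass, output layer first.
    delta = [t - o for t, o in zip(target, acts[-1])]
    for m in range(len(w) - 1, -1, -1):
        out = acts[m + 1]
        slope = [o * (1 - o) / t for o, t in zip(out, temps[m])]  # do/ds
        do_dt = [-(s / t) * g for s, t, g in zip(nets[m], temps[m], slope)]
        # Error signal for the layer below, using pre-update weights.
        below = [sum(delta[i] * slope[i] * w[m][i][k] for i in range(len(out)))
                 for k in range(len(acts[m]))]
        for i in range(len(out)):
            if update_weights:
                for k in range(len(acts[m])):
                    w[m][i][k] += eta_w * delta[i] * slope[i] * acts[m][k]
            if update_temps:
                temps[m][i] = max(T_FLOOR,
                                  temps[m][i] + eta_t * delta[i] * do_dt[i])
        delta = below
    return error


# Hypothetical starting values and update periods for the XOR patterns.
w = [[[0.5, -0.5], [1.0, 1.0]], [[1.5, -1.0]]]
temps = [[1.0, 1.0], [1.0]]
patterns = [([0, 0], [0]), ([0, 1], [1]), ([1, 0], [1]), ([1, 1], [0])]
weight_period, temp_period = 1, 2
for it in range(1000):
    x, t = patterns[it % len(patterns)]
    train_step(x, t, w, temps,
               update_weights=(it % weight_period == 0),
               update_temps=(it % temp_period == 0))
print(w, temps)
```

Setting `update_temps=False` on every call reduces the step to plain back propagation with fixed temperatures, so the same routine can serve both sides of a comparison.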

In deriving the learning rule for temperature optimization as
implemented in the present invention, the derivative of the
activation function of a neuron played a key role. The
inventor has used a sigmoidal type of function in his
simulations whose explicit form is depicted in FIG. 5 and
shown by the graph of FIG. 6 to be extremely sensitive to
small changes in temperature. The sigmoid is shown plotted
against the net input to a neuron for temperatures ranging
from 0.2 to 2.0, in increments of 0.2. However, the steepest
curve was for a temperature of 0.01. The derivative of the
activation function taken with respect to the temperature is
given by,

$$\frac{\partial o_k^m}{\partial T_k^m} = -\frac{s_k^m \, e^{-\beta_k^m s_k^m}}{\left( T_k^m \right)^2 \left( 1 + e^{-\beta_k^m s_k^m} \right)^2}$$
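
The sensitivity of the logistic activation to temperature can be illustrated with the Python sketch below. It assumes the logistic form o = 1/(1 + exp(-s/T)), for which the temperature derivative can be rewritten as -(s/T^2) o (1 - o). The net input value and the function names are hypothetical, and the temperatures are those quoted for FIG. 6.

```python
import math


def logistic(s, temperature):
    """Overflow-safe logistic of s / T."""
    z = s / temperature
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def d_logistic_d_temperature(s, temperature):
    """Analytic do/dT = -(s / T^2) * o * (1 - o)."""
    o = logistic(s, temperature)
    return -(s / temperature ** 2) * o * (1.0 - o)


s = 0.1  # hypothetical net input
for T in [0.01] + [round(0.2 * k, 1) for k in range(1, 11)]:
    f = logistic(s, T)
    slope = d_logistic_d_temperature(s, T)
    print(f"T={T:4.2f}  f={f:.4f}  df/dT={slope:+.4f}")
```

For a fixed positive net input the output rises toward 1 as the temperature falls, and the derivative vanishes once the neuron saturates.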


As shown in FIGS. 7 and 8, the XOR architecture
selected has two input units, two hidden units, and a 
single output unit. Each neuron is characterized by a 
temperature, and neurons are connected by weights. 
Prior to training the network, both the weights and 
parameters were randomized. The initial and final opti- 
mization parameters for a sample training exercise are 
shown in the drawing figures. Specifically, FIG. 7 
shows the values of the randomized weights and tem- 
peratures prior to training and FIG. 8 shows their val- 
ues after training the network for 1,000 iterations. This 
is a case where the network has reached a global minimum.
In both figures, the numbers associated with the
dashed arrows represent the thresholds of the neurons,
and the numbers written next to the solid arrows repre- 
sent the excitatory/inhibitory strengths of the pairwise 
connections. 

To fully evaluate the convergence speed of the present
invention versus the prior art, a benchmark comparison between
the adaptive neuron model (ANM) approach described above and
plain back propagation (BP) was made. In both cases, the
training was started with identical random synaptic weights
lying within the range [-2.0, +2.0] and the same synaptic
weight learning rate $\eta_W = 0.1$. The temperatures of the
neurons in the adaptive model were randomly selected to lie
within the narrow range of [0.9, 1.1] and the temperature
learning rate $\eta$ was set at 0.1. FIGS. 9 and 10 summarize
the training statistics of this comparison. In FIG. 9, the
error is plotted against the training iteration number. In
FIG. 10, the standard deviation of the error over the training
set is shown plotted against the training iteration. In the
first few hundred training iterations in FIG. 9, the
performance of BP and ANM is similar and appears as a broad
shoulder in the curve. Recall, however, that both the weights
and temperatures were randomized prior to training and are,
therefore, far from their final values. As a consequence of
the low values of the learning rates used, the error is large
and will only begin to get smaller when the weights and
temperatures begin to fall in the right domain of values. In
the ANM, the shoulder terminus is marked by a
phase-transition-like discontinuity in both error and standard
deviation. For the particular example shown, this occurred at
the 637th iteration. A drop of several orders of magnitude in
the error and standard deviation is observed within the next
ten iterations. This sharp drop-off is followed by a much more
gradual decrease in both the error and standard deviation.
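
The starting conditions of this comparison can be set up as in the following Python sketch. The ranges and learning rates are those stated in the text. The seed, the function name `init_xor_network`, the data layout, and the choice of a fixed unit temperature for the BP network are assumptions made only for illustration, and thresholds are omitted.

```python
import copy
import random


def init_xor_network(rng):
    """Random 2-2-1 network: weights in [-2, 2], temperatures in [0.9, 1.1]."""
    sizes = [2, 2, 1]
    weights = [[[rng.uniform(-2.0, 2.0) for _ in range(n_in)]
                for _ in range(n_out)]
               for n_in, n_out in zip(sizes, sizes[1:])]
    temps = [[rng.uniform(0.9, 1.1) for _ in range(n)] for n in sizes[1:]]
    return weights, temps


rng = random.Random(0)  # hypothetical seed, chosen only for repeatability
anm_weights, anm_temps = init_xor_network(rng)

# Both models start from identical synaptic weights.
bp_weights = copy.deepcopy(anm_weights)
# Assumption: plain BP keeps every neuron at a fixed temperature of 1.
bp_temps = [[1.0] * len(layer) for layer in anm_temps]

ETA_W = 0.1  # synaptic weight learning rate, as stated in the text
ETA_T = 0.1  # temperature learning rate, used by the ANM only
print(anm_weights, anm_temps)
```

Using a deep copy keeps the two weight sets independent, so that training one model does not silently alter the starting point of the other.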

In learning the XOR problem using standard BP, it has been
observed that the network frequently gets trapped in local
minima. In FIGS. 11 and 12 we observe such a case as shown by
the dashed line. In numerous simulations on this problem, the
inventor herein has determined that the ANM approach of this
invention is much less likely to become trapped in local
minima as depicted by the solid line in the same figures.

Finally, it can be reported that the simulation tests run by
the inventor herein provide preliminary results that indicate
that the adaptive neuron approach as described above can
significantly outperform prior art back propagation, for
example, by reducing the learning time by several orders of
magnitude. Specifically, the XOR problem was learned to a very
high precision by the ANM network in about $10^3$ training
iterations with a mean square error of about $10^{-6}$ versus
over $10^6$ iterations with a corresponding mean square error
of about $10^{-3}$ for the prior art BP approach to the same
problem.

Wherefore, having thus described my invention,
what is claimed is: 

1. In a neural network employing a plurality of neurons each
associated with a respective conductor in a network of
conductors which are selectively interconnected by a plurality
of problem-defining synapses each having an adjustable
weighting factor whereby each neuron receives a weighted sum
of inputs from plural conductors of a previous layer of said
network and produces an output to a conductor in a following
layer of said network, wherein the neural network is
parameterized in an iterative learning process in which
problem-defining signals are input to the neurons whereby to
produce error signals between actual outputs from the network
and expected outputs from the network and said error signals
are used to incrementally change each weighting factor, an
improvement for reducing the number of iterations required
from the neural network to learn how to solve a problem of
interest comprising:

in each neuron between an input thereof for receiving said
weighted sum of inputs and an output therefor for outputting
an output signal value, including a neural conductive element
having a variable gain defining said output signal value as a
function of (a) said variable gain and (b) said weighted sum
of inputs and gain adjustment logic means for dynamically
adjusting the variable gain of the neuron independently of the
other neurons in the network during a learning process of the
neural network.

2. The improvement to a neural network of claim 1
wherein:

said gain adjustment logic means dynamically adjusts 
said variable gain as a function of an instantaneous 
error signal. 
3. The improvement to a neural network of claim 2
wherein: 

said gain adjustment logic means makes incremental 
changes in said variable gain which are propor- 
tional to the negative of the derivative of said in- 
stantaneous error signal with respect to said vari- 
able gain. 

4. The improvement to a neural network of claim 3 wherein an
incremental change $\Delta T_i^n$ for an i-th neuron on a
layer n can be given as:

$$\Delta T_i^n = -\eta \frac{\partial E}{\partial T_i^n}$$

where $\eta$ is the temperature learning rate, which is a
pre-established constant and E is the sum of said
instantaneous error signals.

5. A neural network comprising: 

a plurality of signal paths; 

a plurality of synapses each comprising means for 
coupling respective pairs of said signal paths to- 
gether, each of said synapses having a signal 
weighting factor which is adjusted during an itera- 
tive learning process of said neural network; and 

a plurality of neurons each comprising means for 
providing a respective gain in respective ones of 
said signal paths and for individually adjusting said 
respective gain independently of the other neurons 
in the network during an iterative learning process 
of said neural network. 

6. The neural network of claim 5 wherein the learning 
process of said synapses and the learning process of said 
neurons are the same learning process whereby the 
weighting factors of each of said synapses and the gains 
of each of said neurons are adjusted simultaneously in 
said same iterative learning process of said neural net- 
work. 

7. The neural network of claim 5 wherein the learning 
process of said synapses and the learning process of said
neurons are independent learning processes, wherein 
changes to said gains and changes to said weights are 
not simultaneous. 

8. The neural network of claim 5 wherein said neural 
network comprises first and second layers, each of said 
layers comprising a respective set of said signal paths 
and a respective set of said synapses which interconnect 
the signal paths in said first layer, the signal paths of said 
first layer having outputs connected to inputs of respec- 
tive ones of said neurons and the signal paths of said 
second layer having inputs connected to said respective 
neurons. 

9. The neural network of claim 8 wherein each neu- 
ron provides a summation node between (a) outputs of 
signal path connected together by respective synapses 
in said first layer and (b) an input of respective signal 
path in said second layer. 

10. The neural network of claim 8 wherein others of 
said neurons have their outputs connected to inputs of 
respective signal paths in said first layer. 

11. The neural network of claim 5 wherein said acti- 
vation function is a sigmoid function having a slope 
which determines said gain. 

12. The neural network of claim 11 wherein said 
function of said error signals comprise the negative 
derivative of said error signal with respect to said gain. 

13. The neural network of claim 5 wherein said itera- 
tive training process comprises applying a set of known 
signals to inputs of selected ones of said signal paths and
producing error signals corresponding to differences 
between predetermined values and signals at outputs of 
certain ones of said signal paths, and wherein the gain of 
each of said neurons is adjusted as a function of said
error signals. 

14. A method for training a neural network having a 
plurality of signal paths, a plurality of synapses each 
coupling a respective pair of said signal paths together, 
each of said synapses having a signal weighting factor
and a plurality of neurons each providing a respective 
gain in a respective one of said signal paths, said method 
comprising performing an iterative learning process 
while: 

adjusting the weighting factors of said synapses; and

individually adjusting the respective gains of each of 
said neurons independently of the other neurons in 
the network. 

15. The method of claim 14 wherein each iterative 
learning process comprises a gradient descent method.

16. The method of claim 14 wherein the learning 
process of said synapses and the learning process of said 
neurons are the same learning process whereby the
weighting factors of each of said synapses and the gains
of each of said neurons are adjusted simultaneously in 
said same iterative learning process of said neural net- 
work. 

17. The method of claim 14 wherein the learning 
process of said synapses and the learning process of said 
neurons are independent learning processes, wherein 
changes to said gains and changes to said weights are 
not simultaneous. 

18. The method of claim 14 wherein each iterative 
training process comprises applying a set of known 
signals to inputs of selected ones of said signal paths and 
producing error signals corresponding to differences 
between predetermined values and signals at outputs of 
certain ones of said signal paths, and wherein said itera- 
tive learning process for said neurons comprises adjust- 
ing the gain of each of said neurons as a function of said 
error signals. 

19. The method of claim 18 wherein said function of
said error signals comprises a gradient of said error
signals with respect to said gain.

* * * * *